Skip to content

Drop cache-hit lookups to DEBUG, document the healthcheck amplifier#31

Merged
jdoss merged 1 commit intomasterfrom
fix/quiet-cache-hit-logs
Apr 17, 2026
Merged

Drop cache-hit lookups to DEBUG, document the healthcheck amplifier#31
jdoss merged 1 commit intomasterfrom
fix/quiet-cache-hit-logs

Conversation

@jdoss
Copy link
Copy Markdown
Contributor

@jdoss jdoss commented Apr 17, 2026

Summary

Podman re-resolves every Secret= reference in a container's quadlet on every podman healthcheck run call — not just at container start. With a default HealthInterval=30s across ~50 containers, this generates a constant ~15+ lookups/sec against the PSI socket. The cache serves these from an in-memory dict in under a millisecond, so the throughput cost is negligible, but logging every hit at INFO floods the journal with thousands of entries per minute.

Move the cache-hit log line to DEBUG. Cache misses, provider fetches, and errors stay at INFO / WARNING / ERROR so anything interesting still surfaces.

Why

Observed on the test server after PR #29 made the cache actually work: ~16 lookups/sec sustained with all containers stable. Traced to Podman healthcheck cycles — every 30s healthcheck for every container = N secrets × M containers per interval. This is upstream Podman behavior; PSI can't opt out.

Docs

  • New paragraph in the README cache section explaining why psi serve is busy even when containers are stable.
  • New troubleshooting entry in docs/secret-cache.md with the same explanation plus how to re-enable the verbose logging if you want to see the hit rate.

Test plan

  • pytest tests/test_serve.py — all existing tests still pass (no tests assert on log level for cache hits).
  • ruff check / ty check — clean.
  • Deploy to test server, confirm cache-hit log volume drops while provider-fetch and error logs still surface at INFO/WARNING.

Podman re-resolves every Secret= reference in a container's quadlet on
every `podman healthcheck run` call, not just at container start.
With a default HealthInterval=30s across ~50 containers, this generates
a constant ~15+ lookups/sec against the PSI socket. The cache serves
these from an in-memory dict in under a millisecond — the throughput
cost is negligible — but logging every hit at INFO floods the journal
with thousands of entries per minute.

Move the cache-hit log line to DEBUG. Cache misses, provider fetches,
and errors stay at INFO / WARNING / ERROR so anything interesting still
surfaces.

Document the behavior in the README cache section and add a
troubleshooting entry in docs/secret-cache.md explaining that this is
upstream Podman behavior, not a PSI regression.
@jdoss jdoss merged commit e9510d0 into master Apr 17, 2026
2 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant